Abstract

We consider the problem of minimizing a function, which is the sum of a linear function
and a composition of a strongly convex function with a linear transformation, over
a compact polyhedral set. Jaggi and Lacoste-Julien [14] showed that the conditional
gradient method with away steps employed on the aforementioned problem without
the additional linear term has a linear rate of convergence, depending on the so-called
pyramidal width of the feasible set. We revisit this result and provide a variant of
the algorithm and an analysis that is based on simple duality arguments, as well as
corresponding error bounds. This new analysis (a) enables the incorporation of the
additional linear term, (b) does not require a linear-oracle that outputs an extreme
point of the linear mapping of the feasible set and (c) depends on a new constant,
termed “the vertex-facet distance constant”, which is explicitly expressed in terms of
the problem’s parameters and the geometry of the feasible set. This constant replaces
the pyramidal width, which is difficult to evaluate.


1 Introduction 

Consider the minimization problem 

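    $$\min_{x \in X} \{ f(x) \equiv g(Ex) + \langle b, x \rangle \}, \qquad (P)$$

where $X \subseteq \mathbb{R}^n$ is a compact polyhedral set, $E \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^n$ and
$g : \mathbb{R}^m \to \mathbb{R}$ is strongly convex and continuously differentiable over $\mathbb{R}^m$. Note that
for a general matrix $E$, the function $f$ is not necessarily strongly convex.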
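The remark that $f$ need not be strongly convex can be illustrated with a minimal sketch. The data below (a rank-one `E`, `b = 0` and `g(t) = 0.5*||t||^2`) are hypothetical choices made only for illustration: along a direction in the null space of `E` the objective does not change, which rules out strong convexity.

```python
# Illustrative check with hypothetical data (not taken from the paper):
# f(x) = g(Ex) + <b, x> with g(t) = 0.5*||t||^2 and a rank-deficient E.

def matvec(M, x):
    """Return the matrix-vector product M x for a list-of-rows matrix M."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def f(x, E, b):
    """Objective g(Ex) + <b, x> with g(t) = 0.5*||t||^2."""
    t = matvec(E, x)
    return 0.5 * sum(ti * ti for ti in t) + sum(bi * xi for bi, xi in zip(b, x))

E = [[1.0, 1.0]]   # rank 1; its null space is spanned by d = (1, -1)
b = [0.0, 0.0]
x = [0.3, -0.2]
d = [1.0, -1.0]

# f takes (numerically) the same value at x - d, x and x + d,
# so f is flat along d and cannot be strongly convex.
values_along_d = [f([x[0] + s * d[0], x[1] + s * d[1]], E, b) for s in (-1.0, 0.0, 1.0)]
spread = max(values_along_d) - min(values_along_d)
```

With a nonzero `b` the objective would be linear rather than constant along `d`, which still excludes strong convexity.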

*Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa,
Israel. Email: becka@ie.technion.ac.il.

†Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa,
Israel. Email: shimrits@tx.technion.ac.il.





When the problem at hand is large-scale, first order methods, which have relatively low
computational cost per iteration, are usually utilized. These methods include, for example,
the class of projected (proximal) gradient methods. A drawback of these methods is
that under general convexity assumptions, they possess only a sublinear rate of convergence
mm, while a linear rate of convergence can be established only under additional conditions
such as strong convexity of the objective function m. Luo and Tseng m showed that the
strong convexity assumption can be relaxed and replaced by an assumption on the existence
of a local error bound, and under this assumption, certain classes of algorithms, which
they referred to as “feasible descent methods”, converge at an asymptotically linear rate. The
model (P) with assumptions on strong convexity of $g$, compactness and polyhedrality of
$X$ was shown in m to satisfy the error bound. In [19], Wang and Lin extended the work
m and showed that there exists a global error bound for problem (P) with the additional
assumption of compactness of $X$, and derived the exact linear rate for this case. We note
that the family of “feasible descent methods” includes the block alternating minimization
algorithm (under the assumption of block strong convexity), as well as gradient projection
methods, and therefore these methods are usually at least as complex as evaluating the
orthogonal projection operator onto the feasible set $X$ at each iteration.

An alternative to algorithms which are based on projection (or proximal) operators are
linear-oracle-based algorithms such as the conditional gradient (CG) method. The CG
algorithm was presented by Frank and Wolfe in 1956 [8], for minimizing a convex function
over a compact polyhedral set. At each iteration, the algorithm requires a solution to the
problem of minimizing a linear objective function over the feasible set. It is assumed that
this solution is obtained by a call to a linear-oracle, i.e., a black box which, given a linear
function, returns an optimal solution of this linear function over the feasible set (see an
exact definition in Section 2.3). In some instances, and specifically for certain types of
polyhedral sets, obtaining such a linear-oracle can be done more efficiently than computing
the orthogonal projection onto the feasible set (see examples in [9]), and therefore the CG
algorithm has an advantage over projection-based algorithms. The original paper of Frank
and Wolfe also contained a proof of an $O(1/k)$ rate of convergence of the function values
to the optimal value. Levitin and Polyak showed in m that this $O(1/k)$ rate can also
be extended to the case where the feasible set is a general compact convex set. Canon
and Cullum proved in [5] that this rate is in fact tight. However, if in addition to strong
convexity of the objective function, the optimal solution is in the interior of the feasible
set, then a linear rate of convergence of the CG method can be established^1 [11]. Epelman
and Freund [7], as well as Beck and Teboulle [1], showed a linear rate of convergence of the
conditional gradient with a special stepsize choice in the context of finding a point in the
intersection of an affine space and a closed and convex set under a Slater-type assumption.
Another setting in which a linear rate of convergence can be derived is when the feasible set
is uniformly (strongly) convex and the norm of the gradient of the objective function is
bounded away from zero m.

^1 The paper [11] assumes that the feasible set is a bounded polyhedral set, but the proof is
actually correct for general compact convex sets.

Another approach for deriving a linear rate of convergence is to modify the algorithm.
For example, Hazan and Garber used local linear-oracles in [9] in order to show a linear rate
of convergence of a “localized” version of the conditional gradient method. A different
modification, which is viable when the feasible set is a compact polyhedral set, is to use a
variation of the conditional gradient method that incorporates away steps. This version
of the conditional gradient method, which we refer to as away steps conditional gradient
(ASCG), was initially suggested by Wolfe in [20] and then studied by Guelat and Marcotte,
where a linear rate of convergence was established under the assumption that the
objective function is strongly convex, as well as an assumption on the location of the
optimal solution. In [14], Jaggi and Lacoste-Julien were able to extend this result to the
more general model (P) for the case where $b = 0$, without restrictions on the location
of the solution. We note that the ASCG requires that the linear-oracle produce an
optimal solution of the associated problem which is an extreme point. We will call such an
oracle a vertex linear-oracle (see the discussion in Section 3.1).

Contribution. In this work, our starting point and main motivation are the results of
Jaggi and Lacoste-Julien [14]. Our contribution is threefold:

(a) We extend the results given in [13] and show that the ASCG algorithm converges
linearly for the general case of problem (P), that is, for any value of $E$ and $b$.

The additional linear term $\langle b, x \rangle$ enables us to consider much more general models.
For example, consider the $\ell_1$-regularized least squares problem
$\min_{x \in S} \{ \|Bx - c\|^2 + \lambda \|x\|_1 \}$, where $S \subseteq \mathbb{R}^n$ is a compact polyhedral set,
$B \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^m$ and $\lambda > 0$. Since $S$ is compact, we can find a constant
$M > 0$ for which $\|x\|_1 \le M$ for any $x \in S$. We can now rewrite the model as

    $$\min_{x \in S,\ \|x\|_1 \le y,\ y \in [0, M]} \|Bx - c\|^2 + \lambda y,$$

which obviously fits the general model (P).

(b) The analysis in [14] assumes the existence of a vertex linear-oracle on the set $EX$,
rather than an oracle for the set $X$. This fact is not significant for the “pure” CG
algorithm, since it only requires a linear-oracle and not a vertex linear-oracle. This
means that for the CG algorithm, a linear-oracle on $EX$ can be easily obtained by
applying $E$ on the output of the linear-oracle on $X$. On the other hand, this argument
fails for the ASCG algorithm that specifically requires the oracle to return an extreme
point of the feasible set, and finding such a vertex linear-oracle on $EX$ might be a
complex task; see Section 3.1 for more details. Our analysis only requires a vertex
linear-oracle on the original set $X$.

(c) We present an analysis based on simple duality arguments, which are completely
different than the geometric arguments in [14]. Consequently, we obtain a computable
constant for the rate of convergence, which is explicitly expressed as a function of
the problem’s parameters and the geometry of the feasible set. This constant, which
we call “the vertex-facet distance constant”, replaces the so-called pyramidal width
constant from [14], which reflects the geometry of the feasible set and is obtained as
the optimal value of a very complex mixed integer saddle point optimization problem
whose exact value is unknown even for simple polyhedral sets.
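The reformulation of the $\ell_1$-regularized least squares problem in item (a) can be sketched numerically. The sketch below assumes the lifted variable `z = (x, y)`, the matrix `E = [B, 0]`, the function `g(t) = ||t - c||^2` and the linear term `b = (0, ..., 0, lam)`; these names and the sample data are illustrative and are not taken from the paper. It checks that the lifted objective `g(Ez) + <b, z>` agrees with the original objective when `y = ||x||_1`.

```python
def matvec(M, x):
    """Return the matrix-vector product M x for a list-of-rows matrix M."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def l1_objective(x, B, c, lam):
    """Original objective ||Bx - c||^2 + lam * ||x||_1."""
    r = matvec(B, x)
    return sum((ri - ci) ** 2 for ri, ci in zip(r, c)) + lam * sum(abs(xi) for xi in x)

def lifted_objective(z, B, c, lam):
    """g(Ez) + <b, z> with z = (x, y), E = [B, 0], g(t) = ||t - c||^2, b = (0, ..., 0, lam)."""
    n = len(z) - 1
    E = [list(row) + [0.0] for row in B]   # append a zero column for the variable y
    b = [0.0] * n + [lam]
    t = matvec(E, z)
    g = sum((ti - ci) ** 2 for ti, ci in zip(t, c))
    return g + sum(bi * zi for bi, zi in zip(b, z))

# Hypothetical sample data.
B = [[1.0, 2.0], [0.0, -1.0]]
c = [1.0, 0.5]
lam = 0.1
x = [0.4, -0.3]
y = sum(abs(xi) for xi in x)   # the constraint ||x||_1 <= y is tight at y = ||x||_1
gap = abs(l1_objective(x, B, c, lam) - lifted_objective(x + [y], B, c, lam))
```

Since `lam > 0`, any minimizer of the lifted problem has `y = ||x||_1`, which is why the two problems share optimal values.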


Paper layout. The paper is organized as follows. Section 2 presents some preliminary
results and definitions needed for the analysis. In particular, it provides a brief introduction
to the classical CG algorithm and linear oracles. Section 3 presents the ASCG algorithm
and the convergence analysis, and is divided into four subsections. In Section 3.1 the
concept of vertex linear-oracle, needed for the implementation of ASCG, is presented,
and the difficulties of obtaining a vertex linear-oracle on a linear transformation of the
feasible set are discussed. In Section 3.2 we present the ASCG method with different
possible stepsize choices. In Section 3.3 we provide the rate of convergence analysis of
the ASCG for problem (P), and present the new vertex-facet distance constant used in the
analysis. Finally, in Section 3.4 we demonstrate how to compute this new constant for a
few examples of simple polyhedral sets.

Notations. We denote the cardinality of a set $I$ by $|I|$. The difference, union and
intersection of two given sets $I$ and $J$ are denoted by $I/J = \{a \in I : a \notin J\}$, $I \cup J$ and
$I \cap J$ respectively. Subscript indices represent elements of a vector, while superscript
indices represent iterates of the vector, i.e., $x_i$ is the $i$th element of vector $x$, $x^k$ is a vector
at iteration $k$, and $x_i^k$ is the $i$th element of $x^k$. The vector $e_i \in \mathbb{R}^n$ is the $i$th vector
of the standard basis of $\mathbb{R}^n$, $0 \in \mathbb{R}^n$ is the all-zeros vector, and $1 \in \mathbb{R}^n$ is the vector of
all ones. Given two vectors $x, y \in \mathbb{R}^n$, their dot product is denoted by $\langle x, y \rangle$. Given
a matrix $A \in \mathbb{R}^{m \times n}$ and vector $x \in \mathbb{R}^n$, $\|A\|$ denotes the spectral norm of $A$, and $\|x\|$
denotes the $\ell_2$ norm of $x$, unless stated otherwise. $A^T$, $\mathrm{rank}(A)$ and $\mathrm{Im}(A)$ represent the
transpose, rank and image of $A$ respectively. We denote the $i$th row of a given matrix
$A$ by $A_i$, and given a set $I \subseteq \{1, \ldots, m\}$, $A_I \in \mathbb{R}^{|I| \times n}$ is the submatrix of $A$ such that
$(A_I)_j = A_{I_j}$ for any $j = 1, \ldots, |I|$. If $A$ is a symmetric matrix, then $\lambda_{\min}(A)$ is its
minimal eigenvalue. If a matrix $A$ is also invertible, we denote its inverse by $A^{-1}$. Given
matrices $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{n \times k}$, the matrix $[A, B] \in \mathbb{R}^{n \times (m+k)}$ is their horizontal
concatenation. Given a point $x$ and a closed convex set $X$, the distance between $x$ and
$X$ is denoted by $d(x, X) = \min_{y \in X} \|x - y\|$. The standard unit simplex in $\mathbb{R}^n$ is denoted
by $\Delta_n = \{x \in \mathbb{R}^n_{+} : \langle 1, x \rangle = 1\}$ and its relative interior by $\Delta_n^{+} = \{x \in \mathbb{R}^n_{++} : \langle 1, x \rangle = 1\}$.
Given a set $X \subseteq \mathbb{R}^n$, its convex hull is denoted by $\mathrm{conv}(X)$. Given a convex set $C$, the set
of all its extreme points is denoted by $\mathrm{ext}(C)$.
2 Preliminaries


2.1 Mathematical Preliminaries 

We start by presenting two technical lemmas. The first lemma is the well known descent 
lemma which is fundamental in convergence rate analysis of first order methods. The 
second lemma is Hoffman’s lemma which is used in various error bound analyses over 
polyhedral sets. 

Lemma 2.1 (The Descent Lemma O Proposition A.24]). Let $f : \mathbb{R}^n \to \mathbb{R}$ be a continuously
differentiable function with Lipschitz continuous gradient with constant $\rho$. Then for any
$x, y \in \mathbb{R}^n$ we have

    $$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{\rho}{2} \|x - y\|^2.$$
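As a quick sanity check of the descent lemma, the following sketch evaluates both sides of the inequality on random pairs of points. The quadratic `f(x) = 0.5*(x_1^2 + 4*x_2^2)` is a hypothetical example chosen here because its gradient is Lipschitz continuous with constant 4 (the largest eigenvalue of its Hessian); it is not an example from the paper.

```python
import random

RHO = 4.0  # Lipschitz constant of the gradient of f below (largest eigenvalue of diag(1, 4))

def f(x):
    """Hypothetical smooth test function f(x) = 0.5*(x_1^2 + 4*x_2^2)."""
    return 0.5 * (x[0] ** 2 + 4.0 * x[1] ** 2)

def grad_f(x):
    """Gradient of f."""
    return [x[0], 4.0 * x[1]]

def descent_bound(x, y, rho=RHO):
    """Right-hand side of the descent lemma: f(x) + <grad f(x), y - x> + (rho/2)*||x - y||^2."""
    g = grad_f(x)
    diff = [yi - xi for xi, yi in zip(x, y)]
    linear = sum(gi * di for gi, di in zip(g, diff))
    return f(x) + linear + 0.5 * rho * sum(di * di for di in diff)

rng = random.Random(0)
pairs = [([rng.uniform(-5, 5) for _ in range(2)], [rng.uniform(-5, 5) for _ in range(2)])
         for _ in range(1000)]
# A small tolerance absorbs floating-point rounding.
holds = all(f(y) <= descent_bound(x, y) + 1e-9 for x, y in pairs)
```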

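Lemma 2.2 (Hoffman’s Lemma [13]). Let $X$ be a polyhedron defined by $X = \{x \in \mathbb{R}^n : Ax \le a\}$,
for some $A \in \mathbb{R}^{m \times n}$ and $a \in \mathbb{R}^m$, and let $S = \{x \in \mathbb{R}^n : \tilde{E} x = \tilde{e}\}$ where $\tilde{E} \in \mathbb{R}^{r \times n}$ and
$\tilde{e} \in \mathbb{R}^r$. Assume that $X \cap S \ne \emptyset$. Then, there exists a constant $\theta$, depending only on $A$
and $\tilde{E}$, such that any $x \in X$ satisfies

    $$d(x, X \cap S) \le \theta \|\tilde{E} x - \tilde{e}\|.$$

A complete and simple proof of this lemma is given in [12, pg. 299-301]. Defining $\mathcal{B}$
as the set of all matrices constructed by taking linearly independent rows from the matrix
$[\tilde{E}^T, A^T]^T$, we can write $\theta$ as

    $$\theta = \max_{B \in \mathcal{B}} \frac{1}{\lambda_{\min}(B B^T)}.$$

We will refer to $\theta$ as the Hoffman constant associated with matrix $[\tilde{E}^T, A^T]^T$.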

2.2 Problem’s Properties 

Throughout the article we make the following assumption regarding problem (P).

Assumption 1. (a) $f$ is continuously differentiable and has a Lipschitz continuous gradient
with constant $\rho$.

(b) $g$ is strongly convex with parameter $\sigma_g$.

(c) $X$ is a nonempty compact polyhedral set given by $X = \{x \in \mathbb{R}^n : Ax \le a\}$ for some
$A \in \mathbb{R}^{m \times n}$, $a \in \mathbb{R}^m$.
We denote the optimal solution set of problem (P) by $X^*$. The diameter of the compact
set $X$ is denoted by $D$, and the diameter of the set $EX$ (the diameter of the image of $X$
under the linear mapping associated with matrix $E$) by $D_E$. The two diameters satisfy the
following relation:

    $$D_E = \max_{x, y \in X} \|Ex - Ey\| \le \|E\| \max_{x, y \in X} \|x - y\| = \|E\| D.$$

We define $G = \max_{x \in X} \|\nabla g(Ex)\|$ to be the maximal norm of the gradient of $g$ over $EX$.
Problem (P) possesses some properties, which we present in the following lemmas.

Lemma 2.3 ([19, Lemma 14]). Let $X^*$ be the optimal set of problem (P). Then, there
exists a constant vector $t^*$ and a scalar $s^*$ such that any optimal solution $x^* \in X^*$ satisfies
$Ex^* = t^*$ and $\langle b, x^* \rangle = s^*$.

Although the proof of the lemma in the given reference is for polyhedral sets, the 
extension for any convex set is trivial. 

Lemma 2.4. Let $f^*$ be the optimal value of problem (P). Then, for any $x \in X$

    $$f(x) - f^* \le C,$$

where $C = G D_E + \|b\| D$.

Proof. Let $x^*$ be some optimal solution of problem (P), so that $f(x^*) = f^*$. Then for any
$x \in X$, it follows from the convexity of $f$ that

    $$f(x) - f(x^*) \le \langle \nabla f(x), x - x^* \rangle$$
    $$= \langle \nabla g(Ex), Ex - Ex^* \rangle + \langle b, x - x^* \rangle$$
    $$\le \|\nabla g(Ex)\| \|Ex - Ex^*\| + \|b\| \|x - x^*\|$$
    $$\le G D_E + \|b\| D = C,$$

where the last two inequalities are due to the Cauchy-Schwarz inequality and the definition
of $G$, $D$ and $D_E$. □

The following lemma provides an error bound, i.e., a bound on the distance of any 
feasible solution to the optimal set. This error bound will later be used as an alternative to 
a strong convexity assumption on /, which is usually needed in order to prove a linear rate 
of convergence. This is a different bound than the one given in m, since it relies heavily on 
the compactness of the set X, thus enabling to circumvent the use of the so-called gradient 
mapping. 

Lemma 2.5. For any $x \in X$,

    $$d(x, X^*)^2 \le \kappa (f(x) - f^*),$$

where $\kappa = \theta^2 \left( \|b\| D + 3 G D_E + \frac{2(G^2 + 1)}{\sigma_g} \right)$, and $\theta$ is the Hoffman constant associated with
matrix $[A^T, E^T, b]^T$.
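Proof. Lemma 2.3 implies that the optimal solution set $X^*$ can be defined as $X^* = X \cap S$
where $S = \{x \in \mathbb{R}^n : Ex = t^*, \langle b, x \rangle = s^*\}$ for some $t^* \in \mathbb{R}^m$ and $s^* \in \mathbb{R}$. For any $x \in X$,
applying Lemma 2.2 with $\tilde{E} = [E^T, b]^T$, we have that

    $$d(x, X^*)^2 \le \theta^2 \left( (\langle b, x \rangle - s^*)^2 + \|Ex - t^*\|^2 \right), \qquad (2.1)$$

where $\theta$ is the Hoffman constant associated with matrix $[A^T, E^T, b]^T$. Now, let $x \in X$
and $x^* \in X^*$. Utilizing the $\sigma_g$-strong convexity of $g$, it follows that

    $$\langle \nabla g(Ex^*), Ex - Ex^* \rangle + \frac{\sigma_g}{2} \|Ex - Ex^*\|^2 \le g(Ex) - g(Ex^*). \qquad (2.2)$$

By the first order optimality conditions for problem (P), we have (recalling that $x \in X$
and $x^* \in X^*$)

    $$\langle \nabla f(x^*), x - x^* \rangle \ge 0. \qquad (2.3)$$

Therefore,

    $$\frac{\sigma_g}{2} \|Ex - t^*\|^2 \le \langle \nabla f(x^*), x - x^* \rangle + \frac{\sigma_g}{2} \|Ex - Ex^*\|^2$$
    $$= \langle \nabla g(Ex^*), Ex - Ex^* \rangle + \langle b, x - x^* \rangle + \frac{\sigma_g}{2} \|Ex - Ex^*\|^2. \qquad (2.4)$$

Now, using (2.2) we can continue (2.4) to obtain

    $$\frac{\sigma_g}{2} \|Ex - t^*\|^2 \le g(Ex) - g(Ex^*) + \langle b, x \rangle - \langle b, x^* \rangle = f(x) - f(x^*). \qquad (2.5)$$

We are left with the task of upper bounding $(\langle b, x \rangle - s^*)^2$. By the definitions of $s^*$ and
$f$ we have that

    $$\langle b, x \rangle - s^* = \langle b, x - x^* \rangle$$
    $$= \langle \nabla f(x^*), x - x^* \rangle - \langle \nabla g(Ex^*), Ex - Ex^* \rangle$$
    $$= \langle \nabla f(x^*), x - x^* \rangle - \langle \nabla g(t^*), Ex - t^* \rangle. \qquad (2.6)$$

Therefore, using (2.3), (2.6), as well as the Cauchy-Schwarz inequality, we can conclude
the following:

    $$s^* - \langle b, x \rangle \le \langle \nabla g(t^*), Ex - t^* \rangle \le \|\nabla g(t^*)\| \|Ex - t^*\|. \qquad (2.7)$$

On the other hand, exploiting (2.6), the convexity of $f$ and the Cauchy-Schwarz inequality,
we also have that

    $$\langle b, x \rangle - s^* = \langle \nabla f(x^*), x - x^* \rangle - \langle \nabla g(t^*), Ex - t^* \rangle$$
    $$\le f(x) - f^* - \langle \nabla g(t^*), Ex - t^* \rangle$$
    $$\le f(x) - f^* + \|\nabla g(t^*)\| \|Ex - t^*\|. \qquad (2.8)$$

Combining (2.7), (2.8), and the fact that $f(x) - f^* \ge 0$, we obtain that

    $$(\langle b, x \rangle - s^*)^2 \le \left( f(x) - f^* + \|\nabla g(t^*)\| \|Ex - t^*\| \right)^2. \qquad (2.9)$$

Moreover, the definitions of $G$ and $D_E$ imply $\|\nabla g(t^*)\| \le G$, $\|Ex - t^*\| \le D_E$, and
since $x \in X$, it follows from Lemma 2.4 that $f(x) - f^* \le C = G D_E + \|b\| D$. Utilizing
these bounds, as well as (2.5), to bound (2.9) results in

    $$(\langle b, x \rangle - s^*)^2 \le \left( f(x) - f^* + G \|Ex - t^*\| \right)^2$$
    $$= (f(x) - f^*)^2 + 2G \|Ex - t^*\| (f(x) - f^*) + G^2 \|Ex - t^*\|^2$$
    $$\le (f(x) - f^*) C + 2 G D_E (f(x) - f^*) + G^2 \frac{2}{\sigma_g} (f(x) - f^*)$$
    $$= (f(x) - f^*) \left( \|b\| D + 3 G D_E + \frac{2 G^2}{\sigma_g} \right). \qquad (2.10)$$

Plugging (2.5) and (2.10) back into (2.1), we obtain the desired result:

    $$d(x, X^*)^2 \le \theta^2 \left( \|b\| D + 3 G D_E + \frac{2(G^2 + 1)}{\sigma_g} \right) (f(x) - f^*) = \kappa (f(x) - f^*).$$

□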


2.3 Conditional Gradient and Linear Oracles 

In order to present the CG algorithm, we first define the concept of linear oracles. 

Definition 2.1 (Linear Oracle). Given a set $X$, an operator $O_X : \mathbb{R}^n \to X$ is called a
linear oracle for $X$, if for each $c \in \mathbb{R}^n$ it returns a vector $p \in X$ such that $\langle c, p \rangle \le \langle c, x \rangle$
for any $x \in X$, i.e., $p$ is a minimizer of the linear function $\langle c, x \rangle$ over $X$.

Linear oracles are black-box type functions, where the actual algorithm used in order
to obtain the minimizer is unknown. For many feasible sets, such as $\ell_p$ balls and specific
polyhedral sets, the oracle can be represented by a closed form solution or can be computed
by an efficient method.
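Two closed-form linear oracles can be sketched as follows, for the box $[-1, 1]^n$ and for the unit simplex. The function names `box_oracle` and `simplex_oracle` are illustrative, and the tie-breaking choices (for zero coefficients and for equal minima) are arbitrary assumptions; any minimizer would be a valid output.

```python
def box_oracle(c):
    """Linear oracle for X = [-1, 1]^n: minimize <c, x> coordinate by coordinate."""
    # Each coordinate takes the sign opposite to c_i; for c_i = 0 any value in [-1, 1] is optimal.
    return [-1.0 if ci > 0 else 1.0 for ci in c]

def simplex_oracle(c):
    """Linear oracle for the unit simplex: put all the mass on a smallest coefficient of c."""
    i = min(range(len(c)), key=lambda j: c[j])
    return [1.0 if j == i else 0.0 for j in range(len(c))]

p_box = box_oracle([2.0, -1.0, 0.0])
p_simplex = simplex_oracle([3.0, 1.0, 2.0])
```

Both oracles happen to return extreme points of their sets, which is the property required later for the away-step variant.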

The CG algorithm and its variants are linear-oracle based algorithms. The original CG 
algorithm, presented in [8] - also known as the Frank-Wolfe algorithm - is as follows. 
Conditional Gradient Algorithm (CG)

Input: A linear oracle $O_X$
Initialize: $x^1 \in X$

For $k = 1, 2, \ldots$

    1. Compute $p^k := O_X(\nabla f(x^k))$.

    2. Choose a stepsize $\gamma^k$.

    3. Update $x^{k+1} := x^k + \gamma^k (p^k - x^k)$.


The algorithm is guaranteed to have an $O(1/k)$ rate of convergence for stepsize determined
according to exact line search [8], adaptive stepsize [15] and predetermined stepsize [6].
This upper bound on the rate of convergence is tight [5] and therefore variants, such as the
ASCG, were developed.
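The CG iteration can be sketched in a few lines. The example below is only an illustration under stated assumptions: the feasible set is the unit simplex, the objective is the hypothetical quadratic `0.5*||x - target||^2`, and the predetermined stepsize `2/(k+1)` is one common choice, not necessarily the rule used in the cited references.

```python
def simplex_oracle(c):
    """Linear oracle for the unit simplex: a vertex e_i with the smallest coefficient c_i."""
    i = min(range(len(c)), key=lambda j: c[j])
    return [1.0 if j == i else 0.0 for j in range(len(c))]

def conditional_gradient(grad, oracle, x1, iterations):
    """Run the CG iteration x^{k+1} = x^k + gamma^k (p^k - x^k) for a fixed number of steps."""
    x = list(x1)
    for k in range(1, iterations + 1):
        p = oracle(grad(x))          # step 1: call the linear oracle on the gradient
        gamma = 2.0 / (k + 1)        # step 2: predetermined stepsize (an assumed rule)
        x = [xi + gamma * (pi - xi) for xi, pi in zip(x, p)]   # step 3: update
    return x

# Hypothetical objective f(x) = 0.5*||x - target||^2 with target inside the simplex.
target = [0.2, 0.3, 0.5]

def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]

x_cg = conditional_gradient(grad, simplex_oracle, [1.0, 0.0, 0.0], 2000)
f_cg = 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x_cg, target))
```

Every iterate is a convex combination of oracle outputs, so feasibility is maintained without any projection.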

3 Away Steps Conditional Gradient 

The ASCG algorithm was proposed by Wolfe in [20]. A linear convergence rate
was proven for problems consisting of minimizing strongly convex objective functions over
polyhedral feasible sets in m under some restrictions on the location of the optimal
solution, and in [13] without such restrictions. Jaggi and Lacoste-Julien [14] showed that
the latter result is also applicable for the specific case of problem (P) where $b = 0$ (or
more generally $b \in \mathrm{Im}(E)$), provided that an appropriate linear-oracle is available for the
set $EX$. In this section, we extend this result for the general case of problem (P), i.e., for
any $E$ and $b$. Furthermore, we explore the potential issues with obtaining a linear-oracle
for the set $EX$, and suggest an alternative analysis, which only assumes existence of an
appropriate linear-oracle on the original set $X$. Moreover, our analysis differs from the one
presented in [14] by the fact that it is based on duality rather than geometric arguments.
This approach enables us to derive a computable constant for the rate of convergence, which
is explicitly expressed as a function of the problem’s parameters and the geometry of the
feasible set.

We separate the discussion of the ASCG into four sections. In Section 3.1 we define
the concept of vertex linear oracles, which is needed for the ASCG method, and discuss the
issues of obtaining such an oracle for linear transformations of simple sets. Section 3.2 contains a
full description of the ASCG method itself, including the concept of vertex representation,
and representation reduction. In Section 3.3 we present the rate of convergence analysis of
the ASCG for problem (P), as well as introduce the new computable convergence constant
$\Omega_X$. Finally, in Section 3.4 we demonstrate how to compute $\Omega_X$ for three types of simple
sets.
3.1 Vertex Linear Oracles

The ASCG algorithm requires a linear oracle which is a vertex linear oracle, a concept that 
we now define explicitly. 

Definition 3.1 (Vertex Linear Oracle). Given a polyhedral set $X$ with vertex set $V$, a
linear oracle $O_X : \mathbb{R}^n \to V$ is called a vertex linear oracle for $X$, if for each $c \in \mathbb{R}^n$ it
returns a vertex $p \in V$ such that $\langle c, p \rangle \le \langle c, x \rangle$ for any $x \in V$.

Notice that, according to the fundamental theorem of linear programming [H Theorem
2.7], the problem of optimizing any linear objective function over the compact set $X$ always
has an optimal solution which is a vertex. Therefore, the vertex linear oracle $O_X$ is well
defined. We also note that in this paper the term “vertex” is synonymous with the term
“extreme point”.

In [14], Jaggi and Lacoste-Julien proved that the ASCG algorithm is affine invariant.
This means that given the problem

    $$\min_{x \in X} g(Ex), \qquad (3.1)$$

where $g$ is a strongly convex function and $E$ is some matrix, applying the ASCG algorithm
on the equivalent problem

    $$\min_{y \in Y} g(y), \qquad (3.2)$$

where $Y = EX$, yields a linear rate of convergence, which depends only on the strong
convexity parameter of $g$ and the geometry of the set $Y$ (regardless of what $E$ generated
it). However, assuming that $E$ is not of full column rank, i.e., $f$ is not strongly convex,
retrieving an optimal solution $x^* \in X$ from the optimal solution $y^* \in Y$ requires solving
a linear feasibility problem. This feasibility problem is equivalent to solving the following
constrained least squares problem:

    $$\min_{x \in X} \|Ex - y^*\|^2,$$

which, for a general $E$, may be more computationally expensive than simply applying the
linear oracle on the set $X$. Moreover, in order to apply the algorithm to problem (3.2), a
vertex linear oracle must be available for the set $Y = EX$. Assuming there exists a vertex
linear oracle $O_X$ for $X$, constructing such an oracle $O_{EX}$ for $EX$ may incur an additional
computational cost per iteration. A naive approach to construct a general linear oracle
$O_{EX}$, given $O_X$, is by the formula

    $$O_{EX}(c) = E\, O_X(E^T c). \qquad (3.3)$$

However, the output $p = O_{EX}(c)$ of this linear oracle is not guaranteed to be a vertex of
$EX$, and therefore, in order to obtain a vertex linear oracle $O_{EX}(c)$, a vertex $\tilde{p}$ of $EX$
with the same objective function value as $p$ must still be found. As an example, take $X$
to be the unit box in three dimensions, $X = [-1, 1]^3 \subset \mathbb{R}^3$, and let $E$ be given by

    $$E = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 0 & 0 & 2 \end{pmatrix}.$$

We denote the vertex set $V$ of the set $X$ by the letters A-H as follows:

    $A = (1, 1, 1)^T$, $B = (1, 1, -1)^T$, $C = (1, -1, -1)^T$, $D = (1, -1, 1)^T$,
    $E = (-1, 1, 1)^T$, $F = (-1, -1, 1)^T$, $G = (-1, 1, -1)^T$, $H = (-1, -1, -1)^T$,

and the linear mappings of these vertices by the matrix $E$ by A'-H':

    $A' = (3, 1, 2)^T$, $B' = (1, 3, -2)^T$, $C' = G' = (-1, 1, -2)^T$,
    $F' = (-1, -3, 2)^T$, $H' = (-3, -1, -2)^T$, $D' = E' = (1, -1, 2)^T$.

The vertex set of $EX$ is $\mathrm{ext}(EX) = \{A', B', F', H'\}$.

Figure 1: The sets X and EX (panels: Set X, Set EX).

The sets $X$ and $EX$ are presented in Figure 1. Notice that finding a vertex linear
oracle for $X$ is trivial, while finding one for $EX$ is not. In particular, a vertex linear oracle
for $X$ may be given by any operator $O_X(\cdot)$ satisfying

    $$O_X(c) \in \operatorname{argmin}_{x \in V} \{\langle c, x \rangle\} = \{x \in \{-1, 1\}^3 : x_i c_i = -|c_i|,\ \forall i = 1, 2, 3\}, \quad \forall c \in \mathbb{R}^3. \qquad (3.4)$$
Given the vector $c = (-1, 1, 3)^T$, we want to find

    $$\tilde{p} \in \operatorname{argmin}_{y \in \mathrm{ext}(EX)} \langle c, y \rangle.$$

Using the naive approach, described in (3.3), we obtain a vertex of $X$ by applying the
vertex linear oracle $O_X$ described in (3.4) with parameter $E^T c = (0, 0, 4)^T$, which may
return either one of the vertices B, C, G or H. If vertex C is returned, then its mapping C'
is not a vertex of $EX$. Therefore, the oracle $O_{EX}$ must now search for a vertex
with the same objective function value, or alternatively, discover that C' lies on the face
defined by B' and H', and consequently return one of these vertices. Obviously, this is true
for any $c$ such that $O_X(E^T c)$ returns one of the vertices C, D, E or G. This 3D example
illustrates that, even for a simple $X$, understanding the geometry of the set $EX$, let alone
constructing a vertex linear oracle over it, is not trivial and becomes more complicated as
the dimension of the problem increases.
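The three-dimensional example can be reproduced with a short script. The matrix `E` used below is an assumption: it is the unique matrix consistent with the listed images A', ..., H' of the box vertices, since the printed matrix is not fully legible in this copy. The script computes the parameter $E^T c$ passed to the oracle on $X$, lists the vertices of $X$ that the oracle may return, and checks that the image of C is the midpoint of B' and H', so it is not an extreme point of $EX$.

```python
from itertools import product

# Assumption: E is inferred from the listed images A', ..., H' of the box vertices.
E = [[1, 1, 1],
     [1, 1, -1],
     [0, 0, 2]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def apply(M, x):
    """Return M x as a tuple."""
    return tuple(dot(row, x) for row in M)

def apply_transpose(M, c):
    """Return M^T c as a tuple."""
    return tuple(dot([row[j] for row in M], c) for j in range(len(M[0])))

vertices = list(product((1, -1), repeat=3))   # the 8 vertices of X = [-1, 1]^3
images = {v: apply(E, v) for v in vertices}   # their images under E

c = (-1, 1, 3)
w = apply_transpose(E, c)                     # parameter passed to the oracle on X
best = min(dot(w, v) for v in vertices)
minimizers = {v for v in vertices if dot(w, v) == best}   # possible oracle outputs

B, C, H = (1, 1, -1), (1, -1, -1), (-1, -1, -1)
midpoint_BH = tuple((p + q) / 2 for p, q in zip(images[B], images[H]))
c_image_is_midpoint = images[C] == midpoint_BH
```

Under this assumed `E`, the four minimizers are exactly the vertices with third coordinate $-1$, and two of them (C and G) are mapped to the same non-extreme point of $EX$.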

We aim to show that given a vertex linear oracle for $X$, the ASCG algorithm converges
at a linear rate for problem (P). Since in our analysis we do not assume the existence of
a vertex linear oracle for $EX$, but rather a vertex linear oracle for $X$, the computational
cost per iteration is independent of the matrix $E$, and depends only on the geometry of $X$.

3.2 The ASCG Method 

We will now present the ASCG algorithm. In the following we denote the vertex set of $X$
as $V = \mathrm{ext}(X)$. Moreover, as part of the ASCG algorithm, at each iteration $k$ the iterate
$x^k$ is represented as a convex combination of points in $V$. Specifically, $x^k$ is assumed to
have the representation

    $$x^k = \sum_{v \in V} \mu_v^k v,$$

where $\mu^k \in \Delta_{|V|}$. Let $U^k = \{v \in V : \mu_v^k > 0\}$; then $U^k$ and $\{\mu_v^k\}_{v \in U^k}$ provide a compact
representation of $x^k$, and $x^k$ lies in the relative interior of the set $\mathrm{conv}(U^k)$. Throughout the
algorithm we update $U^k$ and $\mu^k$ via the vertex representation updating (VRU) scheme.
The ASCG method has two types of updates: a forward step, used in the classical CG
algorithm, where a vertex is added to the representation, and an away step, unique to
this algorithm, in which the coefficient of one of the vertices used in the representation is
reduced or even nullified. Specifically, the away step uses the direction $(x^k - u^k)$ where
$u^k \in U^k$ and step size $\gamma^k > 0$ so that

    $$x^{k+1} = x^k + \gamma^k (x^k - u^k)$$
    $$= (x^k - \mu_{u^k}^k u^k)(1 + \gamma^k) + \left( \mu_{u^k}^k - \gamma^k (1 - \mu_{u^k}^k) \right) u^k$$
    $$= \sum_{v \in U^k / \{u^k\}} (1 + \gamma^k) \mu_v^k v + \left( \mu_{u^k}^k (1 + \gamma^k) - \gamma^k \right) u^k,$$

and so $\mu_{u^k}^{k+1} = \mu_{u^k}^k - \gamma^k (1 - \mu_{u^k}^k) < \mu_{u^k}^k$. Moreover, if $\gamma^k = \mu_{u^k}^k / (1 - \mu_{u^k}^k)$, then $\mu_{u^k}^{k+1}$ is nullified,
and consequently, the vertex $u^k$ is removed from the representation. This vertex removal
is referred to as a drop step.
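The effect of an away step on the vertex weights can be sketched as follows. The function name `away_step_weights`, the dictionary representation of the weights and the numerical tolerance are illustrative assumptions; the update itself follows the formula above, and the largest admissible stepsize `mu[u] / (1 - mu[u])` produces a drop step.

```python
def away_step_weights(mu, u, gamma, tol=1e-12):
    """Weights of x + gamma*(x - u), where mu maps each active vertex to its weight.

    Assumes mu[u] < 1, i.e. the representation contains at least two vertices.
    """
    gamma_max = mu[u] / (1.0 - mu[u])      # stepsize that nullifies the weight of u
    assert 0.0 <= gamma <= gamma_max
    new_mu = {v: w * (1.0 + gamma) for v, w in mu.items() if v != u}
    w_u = mu[u] * (1.0 + gamma) - gamma    # equals mu[u] - gamma*(1 - mu[u])
    if w_u > tol:                          # otherwise u is dropped from the representation
        new_mu[u] = w_u
    return new_mu

# Hypothetical representation with three active vertices labelled "a", "b", "c".
mu = {"a": 0.5, "b": 0.25, "c": 0.25}
half_step = away_step_weights(mu, "b", 0.1)
drop_step = away_step_weights(mu, "b", mu["b"] / (1.0 - mu["b"]))
```

In both cases the remaining weights still sum to one, so the new point stays a convex combination of vertices.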

The full description of the ASCG algorithm and the VRU scheme is given as follows. 


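Away Step Conditional Gradient algorithm (ASCG)

Input: A vertex linear oracle $O_X$

Initialize: $x^1 \in V$ where $\mu_{x^1}^1 = 1$, $\mu_v^1 = 0$ for any $v \in V / \{x^1\}$ and $U^1 = \{x^1\}$

For $k = 1, 2, \ldots$

    1. Compute $p^k := O_X(\nabla f(x^k))$.

    2. Compute $u^k \in \operatorname{argmax}_{v \in U^k} \langle \nabla f(x^k), v \rangle$.

    3. If $\langle \nabla f(x^k), p^k - x^k \rangle \le \langle \nabla f(x^k), x^k - u^k \rangle$, then set $d^k := p^k - x^k$ and $\bar{\gamma}^k := 1$.
       Otherwise, set $d^k := x^k - u^k$ and $\bar{\gamma}^k := \mu_{u^k}^k / (1 - \mu_{u^k}^k)$.

    4. Choose a stepsize $\gamma^k$.

    5. Update $x^{k+1} := x^k + \gamma^k d^k$.

    6. Employ the VRU procedure with input $(x^k, (U^k, \mu^k), d^k, \gamma^k, p^k, u^k)$ and obtain an
       updated representation $(U^{k+1}, \mu^{k+1})$.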

The stepsize in the ASCG algorithm can be chosen according to one of the following
stepsize selection rules, where $d^k$ and $\bar{\gamma}^k$ are as defined in the algorithm.

    Exact line search:
    $$\gamma^k \in \operatorname{argmin}_{0 \le \gamma \le \bar{\gamma}^k} f(x^k + \gamma d^k)$$

    Adaptive [15]:
    $$\gamma^k = \min\left\{ -\frac{\langle \nabla f(x^k), d^k \rangle}{\rho \|d^k\|^2},\ \bar{\gamma}^k \right\} \in \operatorname{argmin}_{0 \le \gamma \le \bar{\gamma}^k} \left\{ \gamma \langle \nabla f(x^k), d^k \rangle + \gamma^2 \frac{\rho}{2} \|d^k\|^2 \right\} \qquad (3.5)$$

Remark 3.1. It is simple to show that under either of the above two stepsize strategies,
the sequence of function values $\{f(x^k)\}_{k \ge 1}$ is nonincreasing.
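The adaptive rule can be sketched as a small helper. The function name `adaptive_stepsize` and the sample data are illustrative assumptions; the sketch presumes that `d` is a nonzero descent direction, so that the inner product of the gradient with `d` is negative.

```python
def adaptive_stepsize(grad, d, rho, gamma_max):
    """Adaptive rule: minimize gamma*<grad, d> + gamma^2*(rho/2)*||d||^2 over [0, gamma_max].

    Assumes d is nonzero and <grad, d> <= 0 (descent direction).
    """
    slope = sum(gi * di for gi, di in zip(grad, d))
    norm_sq = sum(di * di for di in d)
    return min(-slope / (rho * norm_sq), gamma_max)

# Hypothetical example: f(x) = 0.5*||x||^2 (so rho = 1), forward step from x towards p.
x = [1.0, 0.0]
p = [0.0, 1.0]
grad = list(x)                                  # gradient of 0.5*||x||^2 at x
d = [pi - xi for pi, xi in zip(p, x)]           # forward direction p - x
gamma = adaptive_stepsize(grad, d, rho=1.0, gamma_max=1.0)
x_next = [xi + gamma * di for xi, di in zip(x, d)]
f_before = 0.5 * sum(xi * xi for xi in x)
f_after = 0.5 * sum(xi * xi for xi in x_next)
```

For this quadratic the adaptive rule coincides with exact line search, because the quadratic upper bound from the descent lemma is tight.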

Since the convergence rate analyses for both of these stepsize options are similar, we
chose to conduct a unified analysis for both cases. The following is the exact definition of
the VRU procedure.
Vertex Representation Updating (VRU) Procedure

Input: $x^k$ - current point,
       $(U^k, \mu^k)$ - vertex representation of $x^k$,
       $d^k, \gamma^k$ - current direction and stepsize,
       $p^k, u^k$ - candidate vertices.

Output: Updated vertex representation $(U^{k+1}, \mu^{k+1})$ of $x^{k+1} = x^k + \gamma^k d^k$.

If $d^k = x^k - u^k$ (away step) then

    1. Update $\mu_v^{k+1} := \mu_v^k (1 + \gamma^k)$ for any $v \in U^k / \{u^k\}$.

    2. Update $\mu_{u^k}^{k+1} := \mu_{u^k}^k (1 + \gamma^k) - \gamma^k$.

    3. If $\mu_{u^k}^{k+1} = 0$ (drop step), then update $U^{k+1} := U^k / \{u^k\}$, otherwise $U^{k+1} := U^k$.

Else ($d^k = p^k - x^k$ - forward step)

    1. Update $\mu_v^{k+1} := \mu_v^k (1 - \gamma^k)$ for any $v \in U^k / \{p^k\}$.

    2. Update $\mu_{p^k}^{k+1} := \mu_{p^k}^k (1 - \gamma^k) + \gamma^k$.

    3. If $\mu_{p^k}^{k+1} = 1$, then update $U^{k+1} := \{p^k\}$, otherwise update $U^{k+1} := U^k \cup \{p^k\}$.

Update $(U^{k+1}, \mu^{k+1}) := \mathcal{R}(U^{k+1}, \mu^{k+1})$ with $\mathcal{R}$ being a representation reduction
procedure with constant $N$.

The VRU scheme uses a representation reduction procedure $\mathcal{R}$ with constant $N$, which is a
procedure that takes a representation $(U, \mu)$ of a point $x$ and replaces it by a representation
$(\tilde{U}, \tilde{\mu})$ of $x$ such that $\tilde{U} \subseteq U$ and $|\tilde{U}| \le N$. We consider two possible options for the
representation reduction procedure:

1. $\mathcal{R}$ is the trivial procedure, meaning it does not change the representation, in which
case its constant is $N = |V|$.

2. The procedure $\mathcal{R}$ is some implementation of the Caratheodory theorem [18, Section
17], in which case its constant is $N = n + 1$. Using this option will accelerate the
algorithm when the number of vertices is not polynomial in the problem’s dimension.
A full description of the incremental representation reduction (IRR) scheme,
which applies the Caratheodory theorem efficiently in this context, is presented in
the Appendix.
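The pieces above (vertex oracle, away vertex, maximal stepsize, adaptive stepsize and the VRU update with the trivial reduction procedure) can be combined into a compact sketch of ASCG. The sketch below is an illustration under stated assumptions, not the authors' implementation: the feasible set is the unit simplex (so vertices are identified with coordinate indices), the objective is the hypothetical quadratic `0.5*||x - target||^2` with `rho = 1`, and the tolerance-based drop test and renormalization are numerical safeguards added here.

```python
def ascg_simplex(grad, rho, n, start, iterations, tol=1e-12):
    """Sketch of ASCG on the unit simplex; vertex e_i is identified with its index i."""
    mu = {start: 1.0}                                   # vertex representation (U, mu) of x
    for _ in range(iterations):
        x = [mu.get(i, 0.0) for i in range(n)]
        g = grad(x)
        gx = sum(gi * xi for gi, xi in zip(g, x))
        p = min(range(n), key=lambda i: g[i])           # vertex linear oracle on the gradient
        u = max(mu, key=lambda i: g[i])                 # active vertex maximizing <g, v>
        forward_slope = g[p] - gx                       # <g, e_p - x>
        away_slope = gx - g[u]                          # <g, x - e_u>
        away = len(mu) > 1 and away_slope < forward_slope
        if away:
            d = list(x)
            d[u] -= 1.0                                 # away direction x - e_u
            slope, gamma_max = away_slope, mu[u] / (1.0 - mu[u])
        else:
            d = [-xi for xi in x]
            d[p] += 1.0                                 # forward direction e_p - x
            slope, gamma_max = forward_slope, 1.0
        if slope >= -tol:                               # no descent direction left: stop
            break
        gamma = min(-slope / (rho * sum(di * di for di in d)), gamma_max)  # adaptive stepsize
        # VRU with the trivial reduction procedure.
        scale = 1.0 + gamma if away else 1.0 - gamma
        mu = {v: w * scale for v, w in mu.items()}
        if away:
            mu[u] -= gamma
        else:
            mu[p] = mu.get(p, 0.0) + gamma
        mu = {v: w for v, w in mu.items() if w > tol}   # drop nullified vertices
        total = sum(mu.values())
        mu = {v: w / total for v, w in mu.items()}      # guard against rounding drift
    return mu

# Hypothetical problem: the minimizer lies on the face spanned by e_0 and e_1.
target = [0.5, 0.5, 0.0]

def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]

weights = ascg_simplex(grad, rho=1.0, n=3, start=2, iterations=200)
x_final = [weights.get(i, 0.0) for i in range(3)]
f_final = 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x_final, target))
```

Starting from the vertex `e_2`, which does not belong to the optimal face, the away steps are what allow the weight of that vertex to be removed instead of only shrinking it through forward steps.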

3.3 Rate of Convergence Analysis 

We will now prove the linear rate of convergence for the ASCG algorithm for problem (P).
In the following we use $I(x)$ to denote the index set of the active constraints at $x$,

    $$I(x) = \{i \in \{1, \ldots, m\} : A_i x = a_i\}.$$

Similarly, for a given set $U$, the set of active constraints for all the points in $U$ is defined
as

    $$I(U) = \{i \in \{1, \ldots, m\} : A_i v = a_i,\ \forall v \in U\} = \bigcap_{v \in U} I(v).$$

We present the following technical lemma, which is similar to a result presented by
Jaggi and Lacoste-Julien [14]^2. In [14] the proof is based on geometrical considerations,
and utilizes the so-called “pyramidal width constant”, which is the optimal value of a
complicated optimization problem, whose value is unknown even for simple sets such as
the unit simplex. In contrast, the proof below relies on simple linear programming duality
arguments, and in addition, the derived constant $\Omega_X$, which replaces the pyramidal width
constant, is computable for many choices of sets $X$.


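Lemma 3.1. Given $U \subseteq V$ and $c \in \mathbb{R}^n$. If there exists a $z \in \mathbb{R}^n$ such that $A_{I(U)} z \le 0$
and $\langle c, z \rangle > 0$, then

    $$\max_{p \in V, u \in U} \langle c, p - u \rangle \ge \frac{\Omega_X}{|U|} \frac{\langle c, z \rangle}{\|z\|},$$

where

    $$\Omega_X = \frac{\zeta}{\varphi} \qquad (3.6)$$

for

    $$\zeta = \min_{v \in V,\ i \in \{1, \ldots, m\} :\ a_i > A_i v} (a_i - A_i v),$$
    $$\varphi = \max_{i \in \{1, \ldots, m\} / I(V)} \|A_i\|.$$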
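The constant in Lemma 3.1 can be evaluated by direct enumeration when the vertex set is small. The sketch below assumes the reading $\Omega_X = \zeta / \varphi$ with $\zeta$ the smallest positive slack $a_i - A_i v$ over all vertices and constraints, and $\varphi$ the largest row norm among constraints that are not active at every vertex. The function name `vertex_facet_distance` and the box example are illustrative.

```python
from itertools import product
from math import sqrt

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def vertex_facet_distance(A, a, vertices, tol=1e-12):
    """Return zeta / phi for X = {x : Ax <= a} with the given vertex list."""
    # slacks[i][j] = a_i - A_i v_j; it is zero exactly when constraint i is active at v_j.
    slacks = [[ai - dot(row, v) for v in vertices] for row, ai in zip(A, a)]
    zeta = min(s for row in slacks for s in row if s > tol)
    # Rows outside I(V) are those with a positive slack at some vertex.
    phi = max(sqrt(dot(A[i], A[i])) for i, row in enumerate(slacks)
              if any(s > tol for s in row))
    return zeta / phi

# Example: the box [-1, 1]^2 written as x <= 1 and -x <= 1.
n = 2
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
A = identity + [[-e for e in row] for row in identity]
a = [1.0] * (2 * n)
vertices = list(product((1.0, -1.0), repeat=n))
omega_box = vertex_facet_distance(A, a, vertices)
```

For the box every slack is either 0 or 2 and every row has unit norm, so the enumeration returns 2. The value depends on the chosen description `(A, a)` of the set; scaling all rows of `A` and entries of `a` by a common factor leaves it unchanged.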


Proof. By the fundamental theorem of linear programming m, we can maximize the
function $\langle c, x \rangle$ on $X$ instead of on $V$ and get the same optimal value. Similarly, we can
minimize the function $\langle c, y \rangle$ on $\mathrm{conv}(U)$ instead of on $U$, and obtain the same optimal
value. Therefore,

    $$\max_{p \in V, u \in U} \langle c, p - u \rangle = \max_{p \in V} \langle c, p \rangle - \min_{u \in U} \langle c, u \rangle$$
    $$= \max_{x \in X} \langle c, x \rangle - \min_{y \in \mathrm{conv}(U)} \langle c, y \rangle$$
    $$= \max_{x : Ax \le a} \langle c, x \rangle + \max_{y \in \mathrm{conv}(U)} \{-\langle c, y \rangle\}. \qquad (3.7)$$

Since $X$ is nonempty and bounded, the problem in $x$ is feasible and bounded above.
Therefore, by strong duality for linear programming,

    $$\max_{x : Ax \le a} \langle c, x \rangle = \min_{\eta \in \mathbb{R}^m_{+} : A^T \eta = c} \langle a, \eta \rangle. \qquad (3.8)$$

^2 This was done as part of the proof of [14, Lemma 6], and does not appear as a separate lemma.
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Plugging (|3.8p back into (13.7p we obtain: 


max (c, p — u) 
pen,use/ 


min (a, 77 )+ max {—(c,y)} 

r;SlRip:A^r;=c ygconv(C/) 

min max (a — Ay, ij) . 

r7eK!p:A^r7=c yeconv(U) 


Since y = JJveu ^ ™ conv(C/), we have that 

max (a — Ay, r/) > (a — Ay, tj) 

ySconv(C/) 

for any value of rj, and therefore, 

min max (a — Ay, 77 ) > min (a — Ay, 77 ) . 

r;SlRip:A^r;=c ySconv(t/) rj£M^:A'^ri=c 


Using strong duality on the RHS of (|3.10p . we obtain that 

min (a — Ay, 77 ) = max {(c, x) : Ax < (a-Ay)}. 

Tj£M^:A^rj=c x 


(3.9) 


(3.10) 


(3.11) 


Denote J = I{U) and J = {1,... ,777} / J. From the definition of I{U), it follows that 

aj - Ajv = 0 (3.12) 

for all V E [/, and that for any i G J there exists at least one vertex v E 1/ such that 
ai — AjV > 0, and hence, 


Oj — AjV > min {aj — Aju) = ^ > 0 , 

uSFjS{l,...,m}:aj>AjU 

which in particular implies that 

^(oj - Ajv) > C > 0. 
v£U 

Since y E conv([/), we can conclude from (I3.12p and (13.131) that 


aj - Ajy = 0 

aT-A;,y = Ag,a„_A^v)>l^. 


(3.13) 


(3.14) 


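Therefore, replacing the RHS of the set of inequalities $Ax \le a - A\tilde{y}$ in (3.11) by the
bounds given in (3.14), we obtain that

\[ \max_{x} \{ \langle c, x \rangle : Ax \le a - A\tilde{y} \} \ge \max_{x} \left\{ \langle c, x \rangle : A_J x \le 0,\ A_{\bar{J}} x \le \mathbf{1} \frac{\zeta}{|U|} \right\}. \tag{3.15} \]

Combining (3.9), (3.10), (3.11) and (3.15) it follows that

\[ \max_{p \in V,\, u \in U} \langle c, p - u \rangle \ge Z^*, \tag{3.16} \]

where

\[ Z^* = \max_{x} \left\{ \langle c, x \rangle : A_J x \le 0,\ A_{\bar{J}} x \le \mathbf{1} \frac{\zeta}{|U|} \right\}. \tag{3.17} \]

We will now show that it is not possible for $z$ to satisfy $A_{\bar{J}} z \le 0$. Suppose by contradiction
that $z$ does satisfy $A_{\bar{J}} z \le 0$. Then $x_\alpha = \alpha z$ is a feasible solution of problem (3.17) for
any $\alpha > 0$, and since $\langle c, z \rangle > 0$ we obtain that $\langle c, x_\alpha \rangle \to \infty$ as $\alpha \to \infty$, and thus $Z^* = \infty$.
However, since $V$ contains a finite number of points, the LHS of (3.16) is bounded from
above, and so $Z^* < \infty$, a contradiction. Therefore, there exists $i \in \bar{J}$ such that $A_i z > 0$.
Since $z \ne 0$, the vector $\bar{x} = z \frac{\Omega_X}{\|z\| |U|}$ is well defined. Moreover, $\bar{x}$ satisfies

\[ A_J \bar{x} = \frac{\Omega_X}{\|z\| |U|} A_J z \le 0, \tag{3.18} \]

and

\[ A_i \bar{x} = \frac{\Omega_X}{\|z\| |U|} A_i z \le \frac{\Omega_X}{\|z\| |U|} \|A_i\| \|z\| \le \frac{\zeta}{|U|} \quad \forall i \in \bar{J}, \tag{3.19} \]

where the first inequality follows from the Cauchy-Schwarz inequality and the second
inequality follows from the fact that if $i \in \bar{J}$, then $i \notin I(V)$ and so $\|A_i\| \le \varphi$. Consequently,
(3.18) and (3.19) imply that $\bar{x}$ is a feasible solution for problem (3.17). Therefore, $Z^* \ge \langle c, \bar{x} \rangle$,
which by (3.16) yields

\[ \max_{p \in V,\, u \in U} \langle c, p - u \rangle \ge \langle c, \bar{x} \rangle = \frac{\Omega_X}{|U|} \frac{\langle c, z \rangle}{\|z\|}. \]

□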

The constant $\Omega_X$ represents a normalized minimal distance between the hyperplanes
that contain facets of $X$ and the vertices of $X$ which do not lie on those hyperplanes. We
will refer to $\Omega_X$ as the vertex-facet distance of $X$. Examples for the derivation of $\Omega_X$ for
some simple polyhedral sets can be found in Section 3.4.
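As an illustration of Lemma 3.1, the following minimal sketch evaluates both sides of the inequality on a small instance. The instance (the box $[-1,1]^2$, the set $U$, and the vectors $c$ and $z$) and all helper names are assumptions chosen for this example and are not taken from the paper.

```python
from itertools import product
from math import sqrt

# Hypothetical instance: X = [-1, 1]^2 written as {x : A x <= a}.
A = [(1, 0), (0, 1), (-1, 0), (0, -1)]
a = [1, 1, 1, 1]
V = list(product((-1, 1), repeat=2))  # the four vertices of X


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def active(v):
    """Indices of the constraints that hold with equality at v."""
    return {i for i, row in enumerate(A) if dot(row, v) == a[i]}


def active_for_all(points):
    """I(points): constraints active at every point of the set."""
    return set.intersection(*(active(v) for v in points))


# Constants from (3.6): zeta is the smallest positive slack over all
# vertex/constraint pairs, phi the largest row norm outside I(V).
zeta = min(a[i] - dot(A[i], v)
           for v in V for i in range(len(A)) if a[i] > dot(A[i], v))
I_V = active_for_all(V)
phi = max(sqrt(dot(A[i], A[i])) for i in range(len(A)) if i not in I_V)
omega = zeta / phi

# Assumed data for the lemma: U = {(1, 1)} and c = z = (-1, 0).
U = [(1, 1)]
c = z = (-1, 0)
assert all(dot(A[i], z) <= 0 for i in active_for_all(U))  # A_{I(U)} z <= 0
assert dot(c, z) > 0

lhs = max(dot(c, p) - dot(c, u) for p in V for u in U)
rhs = omega / len(U) * dot(c, z) / sqrt(dot(z, z))
print(lhs, rhs)
```

On this particular instance the two sides coincide, so the bound of the lemma cannot be improved by a constant factor here.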

The following lemma is a technical result stating that the active constraints at a given
point are the same as the active constraints of the set of vertices in its compact
representation.

Lemma 3.2. Let $x \in X$ and the set $U$ satisfy $x = \sum_{v \in U} \mu_v v$, where $\mu \in \Delta^+_{|U|}$. Then

\[ I(x) = I(U). \]

Proof. It is trivially true that $I(U) \subseteq I(x)$ since $x$ is a convex combination of points in
the affine space defined by $\{ y : A_{I(U)} y = a_{I(U)} \}$. We will prove that $I(x) \subseteq I(U)$. Any
$v \in U \subseteq X$ satisfies $A_{I(x)} v \le a_{I(x)}$. Assume to the contrary that there exists $i \in I(x)$
such that some $u \in U$ satisfies $A_i u < a_i$. Since $\mu_u > 0$ and $\sum_{v \in U} \mu_v = 1$, it follows that

\[ A_i x = \sum_{v \in U} \mu_v A_i v < \sum_{v \in U} \mu_v a_i = a_i, \]

in contradiction to the assumption that $i \in I(x)$. □
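The statement of Lemma 3.2 can be checked on a concrete point. The sketch below uses exact rational arithmetic; the unit simplex in $\mathbb{R}^3$, the set `U`, the weights `mu`, and the helper names are all assumptions made for this example only.

```python
from fractions import Fraction

# Hypothetical instance: the unit simplex in R^3 as {x : A x <= a}.
A = [(-1, 0, 0), (0, -1, 0), (0, 0, -1), (1, 1, 1), (-1, -1, -1)]
a = [0, 0, 0, 1, -1]


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def active(x):
    """I(x): indices of the constraints that are tight at x."""
    return {i for i, row in enumerate(A) if dot(row, x) == a[i]}


def active_set(U):
    """I(U): constraints that are tight at every vertex in U."""
    return set.intersection(*(active(v) for v in U))


U = [(1, 0, 0), (0, 1, 0)]
mu = [Fraction(1, 3), Fraction(2, 3)]  # strictly positive, sums to one
x = tuple(sum(m * v[j] for m, v in zip(mu, U)) for j in range(3))

print(sorted(active(x)), sorted(active_set(U)))
```

Strict positivity of the weights matters: if one weight were zero, the corresponding vertex could violate a constraint that is tight at `x`, and the two index sets could differ.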

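Corollary 3.1. For any $x \in X \setminus X^*$ which can be represented as $x = \sum_{v \in U} \mu_v v$ for some
$\mu \in \Delta^+_{|U|}$ and $U \subseteq V$, it holds that

\[ \max_{u \in U,\, p \in V} \langle \nabla f(x), u - p \rangle \ge \frac{\Omega_X}{|U|} \max_{x^* \in X^*} \frac{\langle \nabla f(x), x - x^* \rangle}{\|x - x^*\|}. \]

Proof. For any $x \in X \setminus X^*$ define $c = -\nabla f(x)$. It follows from Lemma 3.2 that $I(U) = I(x)$.
For any $x^* \in X^*$, the vector $z = x^* - x$ satisfies

\[ A_{I(U)} z = A_{I(x)} x^* - A_{I(x)} x = A_{I(x)} x^* - a_{I(x)} \le 0, \]

and, from the convexity of $f$, as well as the optimality of $x^*$, $\langle c, z \rangle = -\langle \nabla f(x), x^* - x \rangle \ge
f(x) - f(x^*) > 0$. Therefore, invoking Lemma 3.1 achieves the desired result. □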


We now present the main theorem of this section, which establishes the linear rate of
convergence of ASCG for problem (P). This theorem is an extension of [14, Theorem 7],
and the proof follows the same general arguments, while incorporating the use of the error
bound from Lemma 2.5 and the new constant $\Omega_X$.


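Theorem 3.1. Let $\{x^k\}_{k \ge 1}$ be the sequence generated by the ASCG algorithm for solving
problem (P) using a representation reduction procedure $\mathcal{R}$ with constant $N$, and let $f^*$
be the optimal value of the problem. Then for any $k \ge 1$

\[ f(x^k) - f^* \le C (1 - \alpha^{\dagger})^{(k-1)/2}, \tag{3.20} \]

where

\[ \alpha^{\dagger} = \min\left\{ \frac{\Omega_X^2}{8 \rho \kappa D^2 N^2},\ \frac{1}{2} \right\}, \tag{3.21} \]

$\kappa = \theta^2 ( \|b\| D + 3 G D_E + \cdots )$ is the error bound constant of Lemma 2.5, with $\theta$ being the
Hoffman constant associated with matrix $[A^T, E^T, b]^T$, $C = G D_E + \|b\| D$, and $\Omega_X$ is the
vertex-facet distance of $X$ given in (3.6).

Proof. For each $k$ we will denote the stepsize generated by exact line search as $\gamma_e^k$ and the
adaptive stepsize as $\gamma_a^k$. Then

\[ f(x^k + \gamma_e^k d^k) \le f(x^{k+1}) \le f(x^k + \gamma_a^k d^k). \tag{3.22} \]

From Lemma 2.1 (the descent lemma), we have that

\[ f(x^k + \gamma_a^k d^k) \le f(x^k) + \gamma_a^k \langle \nabla f(x^k), d^k \rangle + \frac{(\gamma_a^k)^2 \rho}{2} \|d^k\|^2. \tag{3.23} \]

Assuming that $x^k \notin X^*$, then for any $x^* \in X^*$ we have that

\[
\begin{aligned}
\langle \nabla f(x^k), d^k \rangle
&= \min \{ \langle \nabla f(x^k), p^k - x^k \rangle,\ \langle \nabla f(x^k), x^k - u^k \rangle \} \\
&\le \langle \nabla f(x^k), p^k - x^k \rangle \\
&\le \langle \nabla f(x^k), x^* - x^k \rangle \\
&\le f^* - f(x^k),
\end{aligned}
\tag{3.24}
\]

where the first equality is derived from the algorithm’s specific choice of $d^k$, the third line
follows from the fact that $p^k = \mathcal{O}_X(\nabla f(x^k))$, and the fourth line follows from the convexity
of $f$. In particular, $d^k \ne 0$, and by (3.5) it follows that $\gamma_a^k$ is equal to

\[ \gamma_a^k = \min\left\{ -\frac{\langle \nabla f(x^k), d^k \rangle}{\rho \|d^k\|^2},\ \bar{\gamma}^k \right\}. \tag{3.25} \]

We now separate the analysis into three cases: (a) $d^k = p^k - x^k$ and $\gamma_a^k = \bar{\gamma}^k$, (b) $d^k = x^k - u^k$
and $\gamma_a^k = \bar{\gamma}^k$, and (c) $\gamma_a^k < \bar{\gamma}^k$.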
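The adaptive stepsize of (3.25) and the resulting case split can be sketched in a few lines. The function name and argument names below are illustrative assumptions; `rho` stands for the constant of the descent lemma and `gamma_max` for the maximal admissible stepsize of the current direction.

```python
def adaptive_step(grad, d, rho, gamma_max):
    """Return min{ -<grad, d> / (rho * ||d||^2), gamma_max }, as in (3.25).

    grad and d are sequences of floats of equal length; d must be nonzero.
    """
    slope = sum(g * di for g, di in zip(grad, d))
    norm_sq = sum(di * di for di in d)
    if norm_sq == 0:
        raise ValueError("direction d must be nonzero")
    return min(-slope / (rho * norm_sq), gamma_max)


# The unconstrained minimizer of the quadratic upper bound is 0.5 < 1,
# so the step is interior: this is case (c).
print(adaptive_step((2.0, 0.0), (-1.0, 0.0), rho=4.0, gamma_max=1.0))

# Here the unconstrained minimizer is 2.0, so the step is truncated at
# gamma_max: this is case (a) or (b), depending on the direction type.
print(adaptive_step((2.0, 0.0), (-1.0, 0.0), rho=1.0, gamma_max=1.0))
```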

In cases (a) and (b), it follows from (3.25) that

\[ \bar{\gamma}^k \rho \|d^k\|^2 \le -\langle \nabla f(x^k), d^k \rangle. \tag{3.26} \]

Using inequalities (3.22), (3.23) and (3.26), we obtain

\[
\begin{aligned}
f(x^{k+1}) &\le f(x^k) + \gamma_a^k \langle \nabla f(x^k), d^k \rangle + \frac{(\gamma_a^k)^2 \rho}{2} \|d^k\|^2 \\
&\le f(x^k) + \frac{\bar{\gamma}^k}{2} \langle \nabla f(x^k), d^k \rangle.
\end{aligned}
\]

Subtracting $f^*$ from both sides of the inequality and using (3.24), we have that

\[
f(x^{k+1}) - f^* \le f(x^k) - f^* + \frac{\bar{\gamma}^k}{2} \langle \nabla f(x^k), d^k \rangle
\le (f(x^k) - f^*) \left( 1 - \frac{\bar{\gamma}^k}{2} \right).
\tag{3.27}
\]

In case (a), $\bar{\gamma}^k = 1$, and hence

\[ f(x^{k+1}) - f^* \le \frac{f(x^k) - f^*}{2}. \tag{3.28} \]

In case (b), we have no positive lower bound on $\bar{\gamma}^k$, and therefore we can only conclude,
by the nonnegativity of $\bar{\gamma}^k$, that

\[ f(x^{k+1}) - f^* \le f(x^k) - f^*. \]

However, case (b) is a drop step, meaning in particular that $|U^{k+1}| \le |U^k| - 1$, since before
applying the representation reduction procedure $\mathcal{R}$, we eliminate one of the vertices in the
representation of $x^k$. Denoting the number of drop steps until iteration $k$ as $s_d^k$ and the
number of forward steps until iteration $k$ as $s_f^k$, it follows from the algorithm’s definition
that $s_d^k + s_f^k \le k - 1$ (at each iteration we add a vertex, remove a vertex, or neither) and
$s_d^k \le s_f^k$ (the number of removed vertices cannot exceed the number of added vertices),
and therefore $s_d^k \le (k - 1)/2$.

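We now turn to case (c). In this case, (3.25) implies

\[ \gamma_a^k = -\frac{\langle \nabla f(x^k), d^k \rangle}{\rho \|d^k\|^2}, \]

which combined with (3.22) and (3.23) results in

\[
f(x^{k+1}) \le f(x^k) + \gamma_a^k \langle \nabla f(x^k), d^k \rangle + \frac{(\gamma_a^k)^2 \rho}{2} \|d^k\|^2
= f(x^k) - \frac{\langle \nabla f(x^k), d^k \rangle^2}{2 \rho \|d^k\|^2}.
\tag{3.29}
\]

From the algorithm’s specific choice of $d^k$, we obtain that

\[
\begin{aligned}
0 \ge \langle \nabla f(x^k), p^k - u^k \rangle
&= \langle \nabla f(x^k), p^k - x^k \rangle + \langle \nabla f(x^k), x^k - u^k \rangle \\
&\ge 2 \langle \nabla f(x^k), d^k \rangle.
\end{aligned}
\tag{3.30}
\]

Applying the bound in (3.30) and the inequality $\|d^k\| \le D$ to (3.29), it follows that

\[
f(x^{k+1}) \le f(x^k) - \frac{\langle \nabla f(x^k), d^k \rangle^2}{2 \rho \|d^k\|^2}
\le f(x^k) - \frac{\langle \nabla f(x^k), p^k - u^k \rangle^2}{8 \rho D^2}.
\tag{3.31}
\]

By the definitions of $u^k$ and $p^k$, and since applying the representation reduction procedure $\mathcal{R}$
ensures that $|U^k| \le N$, Corollary 3.1 implies that for any $x^* \in X^*$,

\[
\langle \nabla f(x^k), u^k - p^k \rangle = \max_{p \in V,\, u \in U^k} \langle \nabla f(x^k), u - p \rangle
\ge \frac{\Omega_X}{N} \frac{\langle \nabla f(x^k), x^k - x^* \rangle}{\|x^k - x^*\|}.
\tag{3.32}
\]

Lemma 2.5 implies that there exists $x^* \in X^*$ such that $\|x^k - x^*\|^2 \le \kappa (f(x^k) - f^*)$, which
combined with the convexity of $f$, bounds (3.32) from below as follows:

\[
\begin{aligned}
\langle \nabla f(x^k), u^k - p^k \rangle^2
&\ge \left( \frac{\Omega_X}{N} \right)^2 \frac{\langle \nabla f(x^k), x^k - x^* \rangle^2}{\|x^k - x^*\|^2} \\
&\ge \left( \frac{\Omega_X}{N} \right)^2 \frac{(f(x^k) - f^*)^2}{\|x^k - x^*\|^2} \\
&\ge \left( \frac{\Omega_X}{N} \right)^2 \frac{(f(x^k) - f^*)^2}{\kappa (f(x^k) - f^*)} \\
&= \frac{\Omega_X^2}{\kappa N^2} (f(x^k) - f^*),
\end{aligned}
\]

which along with (3.31) yields

\[
f(x^{k+1}) - f^* \le f(x^k) - f^* - \frac{\langle \nabla f(x^k), u^k - p^k \rangle^2}{8 \rho D^2}
\le (f(x^k) - f^*) \left( 1 - \frac{\Omega_X^2}{8 \rho \kappa D^2 N^2} \right).
\tag{3.33}
\]

Therefore, if either of the cases (a) or (c) occurs, then by (3.28) and (3.33), it follows that

\[ f(x^{k+1}) - f^* \le (1 - \alpha^{\dagger}) (f(x^k) - f^*), \tag{3.34} \]

where $\alpha^{\dagger}$ is defined in (3.21). We can therefore conclude from cases (a)-(c) that until
iteration $k$ we have at least $(k-1)/2$ iterations for which (3.34) holds, and therefore

\[ f(x^k) - f^* \le (f(x^1) - f^*) (1 - \alpha^{\dagger})^{(k-1)/2}. \tag{3.35} \]

Applying Lemma 2.4 for $x = x^1$ we obtain $f(x^1) - f^* \le C$, and the desired result (3.20)
follows. □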
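To get a feeling for the bound (3.20), the following sketch evaluates the rate constant of (3.21) and the resulting guarantee. The function names are hypothetical, and every numerical value passed to them is a placeholder: the true constants depend on the problem data and are not computed here.

```python
import math


def ascg_rate(omega, rho, kappa, D, N):
    """Rate constant of (3.21): min{ omega^2 / (8 rho kappa D^2 N^2), 1/2 }."""
    return min(omega ** 2 / (8.0 * rho * kappa * D ** 2 * N ** 2), 0.5)


def ascg_bound(C, alpha, k):
    """Right-hand side of (3.20): C * (1 - alpha)^((k - 1) / 2)."""
    return C * (1.0 - alpha) ** ((k - 1) / 2.0)


def iterations_for(eps, C, alpha):
    """Smallest k >= 1 for which the bound (3.20) is at most eps."""
    if eps >= C:
        return 1
    return 1 + math.ceil(2.0 * math.log(eps / C) / math.log(1.0 - alpha))


# Placeholder constants, chosen only to exercise the formulas.
alpha = ascg_rate(omega=2.0, rho=1.0, kappa=1.0, D=2.0, N=1)
print(alpha, ascg_bound(C=1.0, alpha=alpha, k=11), iterations_for(1e-3, 1.0, alpha))
```

The exponent $(k-1)/2$ rather than $k-1$ reflects the drop steps, which are only guaranteed not to increase the objective.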


3.4 Examples of Computing the Vertex-Facet Distance $\Omega_X$

In this section, we demonstrate how to compute the vertex-facet distance constant $\Omega_X$ for
a few simple polyhedral sets. We consider three sets: the unit simplex, the $\ell_1$ ball and the
$\ell_\infty$ ball. We first describe each of the sets as a system of linear inequalities of the form
$X = \{x : Ax \le a\}$. Then, given the parameters $A$ and $a$, as well as the vertex set $V$, $\Omega_X$
can be computed by its definition, given by (3.6).

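The unit simplex. The unit simplex $\Delta_n$ can be represented by

\[
A = \begin{bmatrix} -I_{n \times n} \\ \mathbf{1}_n^T \\ -\mathbf{1}_n^T \end{bmatrix} \in \mathbb{R}^{(n+2) \times n},
\qquad
a = \begin{bmatrix} \mathbf{0}_n \\ 1 \\ -1 \end{bmatrix} \in \mathbb{R}^{n+2}.
\tag{3.36}
\]

The set of extreme points is given by $V = \{e_i\}_{i=1}^n$. Notice that since there are only
$n$ extreme points which are all affinely independent, using a rank reduction procedure
which implements the Carathéodory theorem is the same as applying the trivial procedure
that does not change the representation. In order to calculate $\Omega_X$, we first note that
$I(V) = \{n+1, n+2\}$, and therefore

\[ \varphi = \max_{i \in \{1,\ldots,n\}} \|A_i\| = \max_{i \in \{1,\ldots,n\}} \|e_i\| = 1 \]

and

\[ \zeta = \min_{i \in \{1,\ldots,n\},\, v \in V :\, \langle e_i, v \rangle > 0} \langle e_i, v \rangle = \min_{i \in \{1,\ldots,n\}} \|e_i\|^2 = 1, \]

which means that $\Omega_X = \frac{\zeta}{\varphi} = 1$.

The $\ell_1$ ball. The $\ell_1$ ball is given by the set

\[ X = \left\{ x \in \mathbb{R}^n : \sum_{i=1}^n |x_i| \le 1 \right\} = \{ x \in \mathbb{R}^n : \langle w, x \rangle \le 1,\ \forall w \in \{-1,1\}^n \}. \]

Therefore $a = \mathbf{1} \in \mathbb{R}^{2^n}$ and each row of the matrix $A \in \mathbb{R}^{2^n \times n}$ is a vector in $\{-1,1\}^n$. The
set of extreme points is given by $V = \{e_i\}_{i=1}^n \cup \{-e_i\}_{i=1}^n$, and therefore has cardinality
$|V| = 2n$.

Finally, we have that

\[ \varphi = \max_{i \in \{1,\ldots,2^n\}} \|A_i\| = \sqrt{n} \]

and

\[
\begin{aligned}
\zeta &= \min_{v \in V,\, w \in \{-1,1\}^n :\, \langle v, w \rangle < 1} (1 - \langle v, w \rangle) \\
&= \min_{i \in \{1,\ldots,n\},\, w \in \{-1,1\}^n :\, w_i = -1} (1 - \langle e_i, w \rangle) \\
&= \min_{i \in \{1,\ldots,n\},\, w \in \{-1,1\}^n} (1 + |w_i|) = 2,
\end{aligned}
\]

which means that $\Omega_X = \frac{\zeta}{\varphi} = \frac{2}{\sqrt{n}}$.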

The $\ell_\infty$ ball. The $\ell_\infty$ ball is represented by

\[
A = \begin{bmatrix} I_{n \times n} \\ -I_{n \times n} \end{bmatrix} \in \mathbb{R}^{2n \times n},
\qquad
a = \mathbf{1} \in \mathbb{R}^{2n}.
\tag{3.37}
\]

The set of extreme points is given by $V = \{-1,1\}^n$, which in particular implies that
$|V| = 2^n$. Therefore, for large-scale problems, using the representation reduction
procedure, which is based on the Carathéodory theorem, is crucial in order to obtain a practical
implementation.

From the definition of $A$ and $V$, it follows that

\[ \varphi = \max_{i \in \{1,\ldots,2n\}} \|A_i\| = \max_{i \in \{1,\ldots,n\}} \|e_i\| = 1 \]

and

\[ \zeta = \min_{i \in \{1,\ldots,n\},\, v \in \{-1,1\}^n :\, v_i = -1} (1 - \langle e_i, v \rangle) = 2, \]

which implies that $\Omega_X = \frac{\zeta}{\varphi} = 2$.
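The three values derived above can be reproduced by a brute-force evaluation of definition (3.6). The sketch below enumerates all vertex/constraint pairs, so it is only meant for small $n$; the function names and the tuple-based matrix representation are assumptions of this example.

```python
from itertools import product
from math import isclose, sqrt


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def vertex_facet_distance(A, a, V):
    """Omega_X = zeta / phi from (3.6) for X = {x : A x <= a} with vertices V."""
    m = len(A)
    slack = [[a[i] - dot(A[i], v) for v in V] for i in range(m)]
    zeta = min(s for row in slack for s in row if s > 0)
    # A row lies outside I(V) exactly when some vertex has positive slack.
    phi = max(sqrt(dot(A[i], A[i]))
              for i in range(m) if any(s > 0 for s in slack[i]))
    return zeta / phi


def unit_vectors(n):
    return [tuple(int(i == j) for j in range(n)) for i in range(n)]


def negate(vectors):
    return [tuple(-x for x in v) for v in vectors]


def simplex(n):
    E = unit_vectors(n)
    return negate(E) + [(1,) * n, (-1,) * n], [0] * n + [1, -1], E


def l1_ball(n):
    A = list(product((-1, 1), repeat=n))
    E = unit_vectors(n)
    return A, [1] * len(A), E + negate(E)


def linf_ball(n):
    E = unit_vectors(n)
    return E + negate(E), [1] * (2 * n), list(product((-1, 1), repeat=n))


n = 3
print(vertex_facet_distance(*simplex(n)),
      vertex_facet_distance(*l1_ball(n)),
      vertex_facet_distance(*linf_ball(n)))
```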
Appendix A Incremental Representation Reduction using
the Carathéodory Theorem

In this section we will show a way to efficiently and incrementally implement the
constructive proof of the Carathéodory theorem, as part of the VRU scheme, at each iteration of the
ASCG algorithm. We note that this reduction procedure does not have to be employed,
and instead the trivial procedure, which does not change the representation, can be used.
In that case, the upper bound on the number of extreme points in the representation is
just the number of extreme points of the feasible set $X$.

The implementation described in this section will allow maintaining a vertex
representation set $U^k$, with cardinality of at most $n+1$, at a computational cost of $O(n^2)$ operations
per iteration. For this purpose, we assume that at the beginning of iteration $k$, $x^k$ has a
representation with vertex set $U^k = \{v^1, \ldots, v^L\} \subseteq V$, such that the vectors in the set are
affinely independent. Moreover, we assume that at the beginning of iteration $k$, we have at
our disposal two matrices $T^k \in \mathbb{R}^{n \times n}$ and $W^k \in \mathbb{R}^{n \times (L-1)}$. We define $V^k \in \mathbb{R}^{n \times (L-1)}$ to
be the matrix whose $i$th column is the vector $w^i = v^{i+1} - v^1$ for $i = 1, \ldots, L-1$, where $v^1$
is called the reference vertex. The matrix $T^k$ is a product of elementary matrices, which
ensures that the matrix $W^k = T^k V^k$ is in row echelon form. The implementation does
not require saving the matrix $V^k$, and so at each iteration, only the matrices $T^k$ and $W^k$
are updated.

Let $U^{k+1}$ be the vertex set and let $\mu^{k+1}$ be the coefficients vector at the end of iteration
$k$, before applying the rank reduction procedure. Updating the matrices $W^{k+1}$ and $T^{k+1}$,
as well as $U^{k+1}$ and $\mu^{k+1}$, is done according to the following Incremental Representation
Reduction scheme, which is partially based on the proof of the Carathéodory theorem presented
in [THl Section 17].


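Incremental Representation Reduction (IRR)

Input: Representation $(U^{k+1}, \mu^{k+1})$ of the point $x^{k+1}$, the set $U^k = \{v^1, \ldots, v^L\}$ of affinely
independent vectors, and matrices $T^k \in \mathbb{R}^{n \times n}$ and $W^k \in \mathbb{R}^{n \times (L-1)}$.

Output: Updated representation $(U^{k+1}, \mu^{k+1})$ of $x^{k+1}$ and matrices $T^{k+1} \in \mathbb{R}^{n \times n}$
and $W^{k+1} \in \mathbb{R}^{n \times (|U^{k+1}| - 1)}$.

1. Set $L := |U^k|$.

2. Update $T^{k+1} := T^k$.

3. If $|U^{k+1}| = 1$, then set the matrix $W^{k+1}$ to be empty and $T^{k+1} := I$.

4. Else, if $|U^{k+1}| = L$, then set $W^{k+1} := W^k$.

5. Else, if $|U^{k+1}| = L - 1 \ge 1$ (drop step), then

   (a) Find $i^* \in \{1, \ldots, L\}$ such that $v^{i^*} \in U^k \setminus U^{k+1}$.

   (b) If $i^* = 1$ (the reference vertex was removed), then remove the first column
       of $W^k$ and change the reference vertex to $v^2$, using the update formula

       \[ W^{k+1} = W^k \begin{bmatrix} \mathbf{0}^T \\ I_{(L-2) \times (L-2)} \end{bmatrix} + T^k (v^1 - v^2) \mathbf{1}^T, \]

       where $\mathbf{1}, \mathbf{0} \in \mathbb{R}^{L-2}$.

   (c) Else (a non-reference vertex was removed), set $W^{k+1}$ to be $W^k$ with column $i^* - 1$ removed.

6. Else, if $|U^{k+1}| = L + 1$ (forward step), then

   (a) Find $v^{L+1} \in U^{k+1} \setminus U^k$.

   (b) Compute $w^L := v^{L+1} - v^1$.

   (c) Update the matrix $W^{k+1} := [W^k, T^k w^L]$.

   (d) Compute $M$, the row rank of $W^{k+1}$.

   (e) If $L > M$, then

       i. Find a solution $\lambda$ of the following system:

          \[ W^{k+1} \lambda = 0, \quad \lambda_L = -1. \]

       ii. Set the vector $\bar{\lambda} \in \mathbb{R}^{L+1}$ to be

          \[ \bar{\lambda}_i = \begin{cases} -\sum_{j=1}^{L} \lambda_j, & i = 1, \\ \lambda_{i-1}, & i = 2, \ldots, L+1. \end{cases} \]

       iii. Compute $\bar{\alpha} := \min_{i :\, \bar{\lambda}_i < 0} \left( -\frac{\mu_i^{k+1}}{\bar{\lambda}_i} \right)$ and $\underline{\alpha} := \max_{i :\, \bar{\lambda}_i > 0} \left( -\frac{\mu_i^{k+1}}{\bar{\lambda}_i} \right)$, and set

          \[ \alpha = \begin{cases} \bar{\alpha}, & \bar{\lambda}_1 > 0, \\ \underline{\alpha}, & \bar{\lambda}_1 < 0. \end{cases} \]

       iv. Update $\mu_i^{k+1} := \mu_i^{k+1} + \alpha \bar{\lambda}_i$ for all $i = 1, \ldots, L+1$.

       v. Compute $I = \{ i \in \{1, \ldots, L+1\} : \mu_i^{k+1} = 0 \}$.

       vi. For each $i \in I$ remove column $i - 1$ from the matrix $W^{k+1}$.

       vii. Update $U^{k+1} = U^{k+1} \setminus \{v^i\}_{i \in I}$.

7. If $W^{k+1}$ is not in row echelon form, then construct a matrix $T$, as a composition of
   elementary matrices, such that $T W^{k+1}$ is in row echelon form, and update $W^{k+1} := T W^{k+1}$
   and $T^{k+1} := T T^{k+1}$.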
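For orientation, the reduction that the IRR scheme performs incrementally can be sketched in a plain, non-incremental form. The code below is not the IRR scheme itself: it does not maintain the matrices of the scheme, it recomputes an affine dependence from scratch by Gauss-Jordan elimination, and it therefore does not achieve the per-iteration cost discussed in this appendix. All function names are assumptions, and exact rational arithmetic is used so that the zero test on the weights is reliable.

```python
from fractions import Fraction


def null_vector(M):
    """Return a nonzero x with M x = 0, or None if the columns are independent."""
    rows = [[Fraction(e) for e in row] for row in M]
    ncols = len(rows[0])
    pivots = []  # pivots[r] is the pivot column of reduced row r
    r = 0
    for c in range(ncols):
        if r == len(rows):
            break
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [e / pivot for e in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [ei - f * er for ei, er in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = next((c for c in range(ncols) if c not in pivots), None)
    if free is None:
        return None
    x = [Fraction(0)] * ncols
    x[free] = Fraction(1)
    for row_idx, c in enumerate(pivots):
        x[c] = -rows[row_idx][free]
    return x


def caratheodory_reduce(vertices, weights):
    """Drop vertices until the remaining ones are affinely independent.

    weights must be strictly positive and sum to one; the represented
    point sum_i weights[i] * vertices[i] is preserved.
    """
    V = [tuple(v) for v in vertices]
    mu = [Fraction(w) for w in weights]
    while True:
        n = len(V[0])
        # Columns (v_i, 1): a null vector lam satisfies
        # sum_i lam_i v_i = 0 and sum_i lam_i = 0.
        M = [[v[j] for v in V] for j in range(n)] + [[1] * len(V)]
        lam = null_vector(M)
        if lam is None:
            return V, mu
        # Largest step that keeps all weights nonnegative.
        alpha = min(-m / l for m, l in zip(mu, lam) if l < 0)
        mu = [m + alpha * l for m, l in zip(mu, lam)]
        keep = [i for i, m in enumerate(mu) if m != 0]
        V, mu = [V[i] for i in keep], [mu[i] for i in keep]


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
V_red, mu_red = caratheodory_reduce(square, [Fraction(1, 4)] * 4)
print(V_red, mu_red)
```

Unlike step 6(e)iii of the scheme, this sketch always moves in the direction that uses the upper bound on the step, so it makes no attempt to keep a particular reference vertex in the representation.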


Notice that in order to compute the row rank of the matrix $W^{k+1}$ in step 6(d), we
may simply convert the matrix to row echelon form, and then count the number of nonzero
rows. This is done similarly to step 7, and requires ranking of at most one column. We will
need to rerank the matrix in step 7 only if $L > M$, and subsequently at least one column
is removed in step 6(e)vi.

The IRR scheme may reduce the size of the input $U^{k+1}$ only in the case of a forward
step, since otherwise the vertices in $U^{k+1}$ are all affinely independent. Nonetheless, the
IRR scheme must be applied at each iteration in order to maintain the matrices $W^k$ and
$T^k$.

The efficiency of the scheme relies on the fact that only a small number of vertices
are either added to or removed from the representation. The potentially computationally
expensive steps are: step 5(b) - replacing the reference vertex, step 6(d) - finding the row
rank of $W^{k+1}$, step 6(e)i - solving the system of linear equalities, step 6(e)vi - removing
columns corresponding to the vertices eliminated from the representation, and step 7 -
the ranking of the resulting matrix $W^{k+1}$. Step 5(b) can be implemented without explicitly
using matrix multiplication and therefore has a computational cost of $O(n^2)$. Since $W^k$
was in row echelon form, step 6(d) requires a row elimination procedure, similar to step 7,
to be conducted only on the last column of $W^{k+1}$, which involves at most $O(n)$ operations
and an additional $O(n^2)$ operations for updating $T^{k+1}$. Moreover, since $W^k$ was full column
rank, the IRR scheme guarantees that in step 6(e)i the vector $\lambda$ has a unique solution, and
since $W^{k+1}$ is in row echelon form, it can be found in $O(n^2)$ operations. Moreover, in step
6(e)vi, the specific choice of $\alpha$ ensures that the reference vertex is not eliminated from
the representation, and so there is no need to change the reference vertex at this stage.
Furthermore, it is reasonable to assume that the set $I$ satisfies $|I| = O(1)$, since otherwise
the vector $x^{k+1}$, produced by a forward step, can be represented by significantly fewer vertices
than $x^k$, which, although possible, is numerically unlikely. Therefore, assuming that indeed
$|I| = O(1)$, the matrix $T$, calculated in step 7, applies a row elimination procedure to at
most $O(1)$ rows (one for each column removed from $W^{k+1}$) or one column (if a column
was added to $W^k$). Conducting such an elimination on either row or column takes at
most $O(n^2)$ operations, which may include row switching and at most $n$ row addition and
multiplication operations. Therefore, the total computational cost of the IRR scheme amounts to
$O(n^2)$.
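The claim that the reference vertex can be replaced without a full matrix product rests on the fact that the change amounts to dropping one column and adding a single vector to every remaining column. The sketch below checks this identity on a small instance; the vertices, the matrix `T`, and the helper names are arbitrary assumptions, and `T` is not required to produce a row echelon form here, since the identity holds for any matrix.

```python
def matmul(P, Q):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def diff_matrix(others, ref):
    """Matrix whose columns are v - ref for each v in others (order kept)."""
    return [[v[j] - ref[j] for v in others] for j in range(len(ref))]


# Hypothetical data: four vertices in R^3 and an arbitrary 3 x 3 matrix T.
verts = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
T = [[1, 2, 0], [0, 1, 3], [1, 0, 1]]
v1, v2 = verts[0], verts[1]

# W for reference vertex v1: columns T (v^{i+1} - v^1).
W_old = matmul(T, diff_matrix(verts[1:], v1))

# Rank-one style update: drop the first column, then add T (v^1 - v^2)
# to every remaining column. Only one matrix-vector product is needed.
shift = matmul(T, [[v1[j] - v2[j]] for j in range(len(v1))])
W_new = [[row[c] + shift[i][0] for c in range(1, len(row))]
         for i, row in enumerate(W_old)]

# Direct recomputation with v2 as the reference vertex.
W_direct = matmul(T, diff_matrix(verts[2:], v2))
print(W_new == W_direct)
```

The update touches each remaining entry once after a single matrix-vector product, which is consistent with the quadratic cost attributed to step 5(b).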